West Point, SAAVB, and BBN/AUB Arabic Speech Corpora: A Comparative Survey

Authors

  • Yousef Ajami Alotaibi
  • Ali Hamid Meftah
Abstract

The aim of this paper is to evaluate three public Arabic speech corpora, namely the West Point (WP) corpus, the Saudi Accented Arabic Voice Bank (SAAVB), and the BBN Technologies/American University of Beirut (BBN/AUB) corpus, using the TIMIT English speech corpus as a benchmark. The paper covers the strengths, weaknesses, and discrepancies of these Arabic corpora with respect to their design and content. This evaluation matters for Arabic speech processing because Arabic remains an under-resourced language despite its importance and popularity. We are currently considering the WP and BBN/AUB corpora for analysing and studying Arabic rhythm in our ongoing research project.

Keywords: Arabic language; TIMIT; West Point; SAAVB; BBN/AUB.

Similar resources

Using a Telephony Saudi Accented Arabic Corpus in Automatic Recognition of Spoken Arabic Digits

In this research, spoken Arabic digits are investigated from the point of view of the speech recognition problem. The system is designed to recognize isolated whole-word speech. In the training and testing phases of this system, isolated-digit data sets are taken from the telephony Arabic speech corpus, SAAVB. This standard corpus was developed by KACST and it is classified as a noisy speech databa...

Saudi accented Arabic voice bank

The aim of this paper is to present an Arabic speech database that represents Arabic native speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets, selecting the right speakers and transcribing their speech are some of the challenges that faced the project team. The procedures that met these challenges are h...

Speech Recognition System of Arabic Digits based on A Telephony Arabic Corpus

Automatic recognition of spoken digits is one of the difficult tasks in the field of computer speech recognition. Spoken digit recognition is required in many applications such as speech-based telephone dialing, airline reservation, and automatic directories to retrieve or send information. These applications take numbers and letters of the alphabet as input. Arabic language is a Semitic language tha...

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

Automatic recognition of spoken letters of the alphabet is one of the difficult tasks in the field of computer speech recognition. In this research, the spoken Arabic alphabet is investigated from the point of view of the speech recognition problem. The system is designed to recognize isolated whole-word speech. The Hidden Markov Model Toolkit (HTK) is used to implement the isolated word recognizer with phoneme ba...

The BBN Byblos 1997 large vocabulary conversational speech recognition system

This paper presents the 1997 BBN Byblos Large Vocabulary Speech Recognition (LVCSR) system. We give an outline of the algorithms and procedures used to train the system, describe the recognizer configuration and present the major technological innovations that lead to performance improvements. The major testbed we present our results for is the Switchboard Corpus, where current word error rates...

Publication date: 2012